spec: voice input hotkey picker — Caps Lock, F-keys, custom (#9869)#10127
spec: voice input hotkey picker — Caps Lock, F-keys, custom (#9869)#10127lonexreb wants to merge 5 commits intowarpdotdev:masterfrom
Conversation
…ev#9869) Adds product.md + tech.md for issue warpdotdev#9869: the voice input hotkey picker in Settings > Agents > Voice exposes a fixed list of 10 modifier/function keys, missing Caps Lock and individual F-keys (F1-F12) — the two most-requested options for push-to-talk in real ergonomic setups — and offering no escape hatch for any other key. Investigation: - Setting is VoiceInputToggleKey enum at app/src/settings/ai.rs:110. Closed enum with hand-written display_name, to_key_code, keystroke, and tooltip_message methods. - Underlying KeyCode at crates/warpui_core/src/platform/keyboard.rs already supports CapsLock and F1-F35; the enum is artificially narrower than the platform layer can handle. - Persistence is SyncToCloud::Never (intentional, per-device keyboards) — preserved. - Action dispatch at block_list_element.rs:3072 and alt_screen_element.rs:620 already speaks KeyCode, so new variants don't require dispatch-layer changes. Spec proposes: - Extend the enum with CapsLock, F1-F12, and a Custom(KeyCode) variant (single source of truth, justified against an override-field alternative). - Sectioned dropdown ("Modifier keys" / "Function keys" / "Other") to keep 22 entries scannable; existing modifiers stay at the top so existing users see no reorder. - Press-to-capture modal for "Custom key..." with a B5 reject-list (app-quit shortcut, Enter/Escape/Tab/Backspace/Space) and an >800ms held-modifier carve-out. - Caps Lock on macOS uses press-to-toggle semantics (one tap on, next tap off) because macOS Caps Lock fires only on toggle events, not press/release. Discord and Krisp both take this approach. - Validation reset for hand-edited TOML with unbindable values, with a one-time toast. Test plan covers ten unit cases (TOML round-trip, display, key mapping, toggle-mode, validation reset) and three integration tests (modal capture flow, Caps Lock toggle, F-key hold-to-talk). Four open questions for maintainer review on SettingsValue derive support for payload variants, dropdown section heading constructors, the macOS Caps Lock decision, and KeyCode display exposure.
|
I'm starting a first review of this spec-only pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This spec adds product and technical direction for expanding the voice input hotkey picker with Caps Lock, F-keys, and a custom press-to-capture path. The scope aligns with GH-9869, but several requirements currently contradict each other or omit implementation details needed to avoid a broken follow-up implementation.
Concerns
- Product and tech specs disagree on whether macOS Caps Lock is hold-to-talk or tap-to-toggle.
- The chosen
Custom(KeyCode)model needs a concrete replacement for the current finite enum iteration and a settings-file serialization plan that works withKeyCode. - The invalid TOML reset/toast design does not explain how validation sees values that fail deserialization.
- Shortcut rejection needs full keystroke/modifier semantics, not just a
KeyCodereject list.
Security
- Telemetry should explicitly exclude raw capture attempts and rejected/cancelled keypresses, recording only the final saved hotkey value.
Verdict
Found: 1 critical, 4 important, 1 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
| ### B1 — Caps Lock as a first-class option | ||
|
|
||
| Caps Lock appears in the dropdown on all platforms (macOS, Windows, | ||
| Linux). When selected, holding Caps Lock activates voice input and |
There was a problem hiding this comment.
🚨 [CRITICAL] This says Caps Lock is hold/release on all platforms, but tech.md requires macOS press-to-toggle; make the product behavior and acceptance criteria explicitly platform-specific so implementation has one source of truth.
| CapsLock, // NEW | ||
| F1, F2, F3, F4, F5, F6, // NEW | ||
| F7, F8, F9, F10, F11, F12, // NEW | ||
| Custom(KeyCode), // NEW |
There was a problem hiding this comment.
Custom(KeyCode) to the existing enum breaks the current EnumIter/all_possible_values() dropdown flow and requires a SettingsValue path for KeyCode; specify an explicit predefined-option list plus the manual Custom settings-file handling.
| ``` | ||
| For `Custom`: | ||
| ```toml | ||
| voice_input_toggle_key = { custom = "F19" } |
There was a problem hiding this comment.
{ kind = "custom", key = "f19" } fallback and product.md's lowercase { custom = "f19" }; choose one canonical format and make the tests assert it.
| `AISettingsPageAction::SetVoiceInputToggleKey(VoiceInputToggleKey::Custom(key_code))`. | ||
| - On Cancel or Escape, closes without dispatch. | ||
| - The modal's reject-list (B5) is constructed at open time from: | ||
| - The currently-bound app-quit shortcut (read from key bindings). |
There was a problem hiding this comment.
KeyCode; specify tracking modifiers and comparing a full Keystroke so Cmd+Q is rejected without banning bare Q.
| deserializer falls back to `Default` today. We need to additionally | ||
| fire a one-time toast. | ||
|
|
||
| Implementation: add a `validate()` method on `VoiceInputToggleKey` |
There was a problem hiding this comment.
voice_input_toggle_key = "enter" if deserialization has already failed/defaulted; specify where the raw parse failure is surfaced so A6's one-time toast actually fires.
|
|
||
| ### B8 — Telemetry: which key was bound | ||
|
|
||
| When the user changes the voice input hotkey, fire the existing |
There was a problem hiding this comment.
💡 [SUGGESTION] [SECURITY] Clarify telemetry fires only after Confirm/save and records only the final enum/KeyCode label, not raw key events, rejected keys, or cancelled captures.
Bot review (warpdotdev#10127) identified 1 critical + 4 important + 1 security suggestion concerns: - Critical: product and tech specs disagreed on whether macOS Caps Lock is hold-to-talk or tap-to-toggle. - Important: Custom(KeyCode) breaks the existing EnumIter / all_possible_values() flow with no concrete replacement. - Important: TOML format for Custom variant was specified two ways (`{ kind = "custom", key = "f19" }` and `{ custom = "F19" }`). - Important: app-quit reject-list compared bare KeyCode, not Keystroke — would either ban bare Q or fail to reject Cmd+Q. - Important: validation hook can never see TOML values that fail parsing, so A6's toast would never fire. - Suggestion: telemetry must explicitly exclude raw capture attempts and rejected/cancelled keypresses. Fixes: - B1 split into platform-specific semantics: macOS press-to-toggle (one tap on, next tap off — matches macOS Caps Lock event model and Discord/Krisp UX), Windows/Linux hold-to-talk. A4 acceptance criterion split into A4-mac and A4-win-lin. - Added "Replacing the EnumIter / all_possible_values() flow" section with a concrete predefined_options() function that returns the curated unit variants only. Custom(KeyCode) is constructed at capture time, not iterated. EnumIter derive is dropped. - Added "Settings serialization" section justifying the hand-written Serialize/Deserialize impl over a one-off macro extension. - Canonicalized the TOML format to `{ custom = "f19" }` with KeyCode payload as snake_case lowercased Serde-default. Tests assert exact strings. - Reject-list comparison now uses full Keystroke (key + modifiers) for app-quit and the existing voice-input shortcut. Static structural rejects (Enter/Escape/Tab/Backspace/Space) remain bare-key. The bare-modifier rule is preserved via the >800ms held- modifier carve-out. - Settings-reset (A6) split into two paths: raw parse failure (intercepted in the hand-written Deserialize impl, surfaced via a side-channel Mutex<Option<String>>) and parsed-but-unbindable (post-parse validate() check). Both fire the same toast. - Added explicit telemetry privacy guardrails to B8: only the final saved value is emitted, never raw captures, rejected keys, or cancelled captures; respects global opt-out. Each correction includes a "Correction (review warpdotdev#10127)" callout so reviewers can cross-reference the bot's concerns.
|
Pushed corrections addressing all 6 bot review concerns (1 critical + 4 important + 1 security):
Each correction includes a "Correction (review #10127)" callout in the spec for cross-reference. Ready for re-review. /oz-review |
There was a problem hiding this comment.
Overview
This spec covers the requested Caps Lock, F-key, and custom-key voice input picker behavior with product and technical detail. The overall scope aligns with the linked issue, and the privacy treatment for captured keys is explicit.
Concerns
- Acceptance criteria still require hold/release semantics for every bound key, contradicting the macOS Caps Lock tap-to-toggle behavior defined earlier.
- The technical spec's voice-input self-conflict rule would prevent users from changing an existing custom hotkey.
- The proposed invalid-TOML recovery path relies on a field deserializer mutating the containing settings object, which is not a feasible Serde design as written.
- Product and tech specs disagree on whether invalid/unbindable settings should leave TOML unchanged or rewrite it.
Verdict
Found: 1 critical, 3 important, 1 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
| A4. Holding the bound key activates voice input; releasing | ||
| deactivates it. Pixel-equivalent to today's behavior for the | ||
| existing modifier keys. |
There was a problem hiding this comment.
| - **Currently-bound voice-input shortcut:** allow re-confirming | ||
| the same key (so the user can re-open and confirm without | ||
| error), but reject if the user picks a different value than | ||
| the existing one for any reason. Same `Keystroke` comparison. |
There was a problem hiding this comment.
🚨 [CRITICAL] This rejects any different custom key, so the modal cannot actually change an existing voice hotkey. The voice-input self-conflict rule should reject only recursive activation while capture is open or the exact conflicting shortcut, not every value different from the current setting.
| 2. **On parse failure, return a special `Default` value tagged with | ||
| the original raw string** instead of erroring out. The tag is | ||
| stored in a side-channel: a `parking_lot::Mutex<Option<String>>` | ||
| on `AISettings` named `voice_input_toggle_key_raw_parse_failure`. |
There was a problem hiding this comment.
Deserialize impl cannot write into a side-channel on the containing AISettings instance while that instance is being constructed. Specify a feasible mechanism, such as manual AISettings deserialization that captures the raw value or a wrapper type returned from parsing.
| - Reset the in-memory value to `None`. | ||
| - Re-write the TOML on disk with the corrected value. |
There was a problem hiding this comment.
| 1. Does the `SettingsValue` derive macro support enum variants with | ||
| payloads (`Custom(KeyCode)`), or do we need a manual | ||
| `Serialize`/`Deserialize` impl? |
There was a problem hiding this comment.
💡 [SUGGESTION] This open question is already answered in the spec: the macro does not support payload variants and the recommendation is manual serialization. Remove it or rephrase it as a maintainer decision needing confirmation.
Round-2 bot review identified 4 remaining concerns:
- A4 acceptance criterion still required hold/release semantics
for every bound key, contradicting B1's macOS Caps Lock tap-to-
toggle that I added in round 1.
- The reject-list rule for the currently-bound voice-input
shortcut would have prevented users from changing their existing
custom hotkey — the literal opposite of the modal's purpose.
- Path 1 of the A6 reset flow proposed mutating a side-channel
Mutex on the containing AISettings struct from inside the field
Deserialize impl. That isn't feasible Serde — a field
deserializer doesn't have a reference to its container.
- Product and tech specs still disagreed on whether invalid TOML
is left unchanged or rewritten.
Fixes:
- A4 split into A4-win-lin (hold-to-talk) and A4-mac (Caps Lock
tap-to-toggle, all other keys hold-to-talk). The split mirrors
B1's existing platform breakdown.
- Reject-list: removed the "reject voice-input self-conflict"
rule. Same-key re-confirm is handled by accepting the capture
and short-circuiting the dispatch when the captured KeyCode
equals the current one. The dispatch handler at
block_list_element.rs:3072 only fires voice activation, never
re-binds, so there is no infinite-loop risk.
- A6 reset path rewritten with a feasible Serde design: the
storage type is a #[serde(untagged)] wrapper enum
VoiceInputTomlValue { Valid(VoiceInputToggleKey),
Invalid(toml::Value) } that captures both successful and failed
parses without erroring out. A single post-init pass on
AISettings::initialize drains the Invalid branch, logs, rewrites
the TOML on disk, and fires the toast. No side-channel mutation
from inside a deserializer.
- Path 2 (parsed-but-unbindable) folded into the same routine,
guaranteeing A6's toast fires once for any combination of TOML-
parse-failure and post-parse-validation-failure.
- product.md A6 updated to "TOML is rewritten" so it matches
tech.md.
Each correction includes a "Correction (re-review warpdotdev#10127)" callout.
|
Round-2 corrections pushed (commit 7387e88):
/oz-review |
There was a problem hiding this comment.
Overview
Reviewed the spec-only PR for GH-9869. The product and technical specs cover the intended Caps Lock, F-key, and custom-key expansion, including compatibility, platform-specific Caps Lock behavior, persistence, tests, and privacy guardrails.
Concerns
- The product and technical specs still disagree on whether the capture flow rejects the currently-bound voice-input shortcut.
- The technical spec does not define how a persisted
Custom(KeyCode)is displayed/selected in the dropdown once that value is outsidepredefined_options(). - The modal display requirements conflict with an unresolved open question about whether to show protocol-level
KeyCodespelling.
Verdict
Found: 0 critical, 2 important, 1 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
| The press-to-capture flow rejects (with an inline error message and | ||
| no save) any key that would brick the user's keyboard: | ||
| - The currently-bound app-quit shortcut. | ||
| - The currently-bound voice-input shortcut itself (no infinite loop). |
There was a problem hiding this comment.
| `Custom(KeyCode)` is **not** part of `predefined_options()` — it is | ||
| constructed by the press-to-capture modal at runtime. The dropdown | ||
| shows a synthetic "Custom key…" entry that opens the modal; once a | ||
| key is captured, the resulting `Custom(KeyCode)` is set as the | ||
| current value but does not appear in the predefined list. |
There was a problem hiding this comment.
Custom(KeyCode) becomes the current value but is absent from the item list, the spec does not define how the dropdown trigger, selected state, or reload path render it; specify whether to inject a selected Custom: F19 row, special-case the trigger label, or use another supported dropdown pattern.
| - Confirms the captured key by name and `KeyCode` (e.g. | ||
| "Selected: F19 (KeyCode::F19)"). |
There was a problem hiding this comment.
💡 [SUGGESTION] This requires showing KeyCode in the modal, while B7 and tech.md's open question lean toward friendly labels; resolve the modal copy so implementers do not ship different UX.
…ec (round 3) Round-3 bot review identified 2 important + 1 suggestion: - Product B5 still listed "the currently-bound voice-input shortcut itself" as a reject-list entry, contradicting the tech-spec round-2 fix that removed it (it would have prevented users from changing an existing custom hotkey). - Tech spec did not specify how the dropdown displays the *currently selected* value when that value is a Custom(KeyCode) outside predefined_options(). Without spec, the set_selected_by_index(None) path would silently fall back to index 0 — a data-loss bug. - Modal display open question still unresolved (whether to show protocol-level KeyCode spelling). Fixes: - B5 product reject-list updated to drop the voice-input self- conflict entry, matching tech.md round 2. - Added "Displaying a persisted Custom(KeyCode) in the dropdown" section: virtual "Custom: <friendly-name>" row prepended when the current value is a Custom not in predefined_options(); re-opens the modal pre-populated when clicked. No data loss on render. - Resolved open question 4: friendly display name primary, protocol-level KeyCode in hover tooltip only. Same friendly name on dropdown row and modal display. Each correction includes a "Correction (re-review warpdotdev#10127)" callout.
|
Round-3 corrections pushed (commit e8eca85):
/oz-review |
|
I'm re-reviewing this spec-only pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
Reviewed the product and technical specs for expanding the voice input hotkey picker to Caps Lock, F-keys, and custom single-key capture. The scope matches GH-9869 and the specs cover the main product behavior, persistence, UI, and test plan.
Concerns
- The product and technical specs still disagree about whether the captured
KeyCodeis shown inline in the modal or only in a hover tooltip. - The invalid-TOML wrapper design is not reconciled with the generated settings API that later examples and tests still rely on.
- One open question is stale after the serialization section already chooses the manual implementation path.
Verdict
Found: 0 critical, 2 important, 1 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
| - Confirms the captured key by name and `KeyCode` (e.g. | ||
| "Selected: F19 (KeyCode::F19)"). |
There was a problem hiding this comment.
KeyCode inline, but tech.md resolves this to friendly name inline with KeyCode only in a hover tooltip. Pick one product-visible behavior so implementation and tests don't diverge.
| // AISettings::voice_input_toggle_key is internally | ||
| // VoiceInputTomlValue, with a public accessor that returns | ||
| // VoiceInputToggleKey (mapping Invalid -> None on read). |
There was a problem hiding this comment.
Setting API as if voice_input_toggle_key.value() returns VoiceInputToggleKey and set_value() accepts VoiceInputToggleKey. Specify the exact setting type/Setting::Value integration or update the API expectations before implementation.
| 1. Does the `SettingsValue` derive macro support enum variants with | ||
| payloads (`Custom(KeyCode)`), or do we need a manual | ||
| `Serialize`/`Deserialize` impl? |
There was a problem hiding this comment.
💡 [SUGGESTION] This question is stale now that the settings serialization section recommends manual Serialize/Deserialize; either remove it or reframe it as a maintainer confirmation of that decision.
- Modal body display reconciled: friendly name only, KeyCode in hover tooltip. Earlier draft had inline KeyCode spelling that contradicted the round-3 Open Question 4 resolution. - Open Question 1 (SettingsValue derive macro) closed: committed to manual Serialize/Deserialize impl per round-2 Settings serialization section. Open Question 4 was already closed. - VoiceInputTomlValue wrapper API reconciled: wrapper is an internal implementation detail. All external accessors (set_value, value(), tests, dropdown reads, dispatch) continue to see the unwrapped VoiceInputToggleKey. After post-init the field is always Valid(...), with the unwrap happening in the accessor. Tests don't see the wrapper.
|
Round-4 corrections (commit ccbec29):
|
|
I'm re-reviewing this spec-only pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This spec covers the requested Caps Lock, F-key, and custom-key expansion for the voice input hotkey picker and includes product behavior, technical design, and validation coverage. The main direction is aligned with issue #9869, and the telemetry section includes appropriate privacy guardrails.
Concerns
- Product and technical specs still disagree on whether the custom capture modal shows the protocol
KeyCodeinline or only in a tooltip. - The current-implementation section says dispatch needs no changes for new variants, but the same tech spec later requires dispatch changes for macOS Caps Lock toggle mode.
- The corrupted-TOML reset design does not match the current settings framework's per-setting
SettingsValueload path, so A6 is not implementable as written. - The custom
KeyCodeTOML wording should avoid implying thatKeyCode's default Serde output is already snake_case.
Verdict
Found: 0 critical, 3 important, 1 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
| A new dropdown entry "Custom key…" opens a modal that prompts | ||
| "Press the key you want to use for voice input." The user presses | ||
| any single key. The modal: | ||
| - Confirms the captured key by name and `KeyCode` (e.g. |
There was a problem hiding this comment.
KeyCode inline, but tech.md later resolves that the KeyCode spelling appears only in a tooltip. Align the product behavior so implementation and tests do not disagree on user-facing copy.
| and the long tail. | ||
| - **Action dispatch:** `block_list_element.rs:3072` and | ||
| `alt_screen_element.rs:620` compare incoming `key_code` to the | ||
| configured one. No code changes needed there for new variants — |
There was a problem hiding this comment.
| ``` | ||
|
|
||
| For `Custom(KeyCode)` — TOML inline-table with one key, `custom`, | ||
| and the `KeyCode` value as its lowercased Serde-default |
There was a problem hiding this comment.
💡 [SUGGESTION] KeyCode currently derives Serde without rename_all, so default Serde output is not snake_case. Reword this to say the manual VoiceInputToggleKey settings-file implementation defines the snake_case mapping, rather than implying KeyCode already provides it.
| > while tech.md said it was rewritten. Resolved to **rewrite** so | ||
| > the toast fires once, not on every launch. | ||
|
|
||
| The TOML field uses a wrapper enum that captures both successful |
There was a problem hiding this comment.
AISettings is deserialized as a struct field, but settings are loaded per-setting through Setting::read_from_preferences and SettingsValue::from_file_value, which returns None and inhibits writes on parse failure before a post-init wrapper can inspect raw TOML. Specify the actual interception point, such as a custom settings-value sentinel or a settings read-path change, so A6 is implementable.
Summary
Spec for issue #9869 — the voice input hotkey picker in Settings → Agents → Voice offers a fixed list of 10 modifier/function keys, missing Caps Lock and individual F-keys (F1–F12), and providing no escape hatch for any other key.
Investigation
VoiceInputToggleKeyenum atapp/src/settings/ai.rs:110. Closed enum with hand-writtendisplay_name,to_key_code,keystroke, andtooltip_message.KeyCodeatcrates/warpui_core/src/platform/keyboard.rsalready exposesCapsLockandF1–F35; the enum is artificially narrower than what the platform layer can identify.SyncToCloud::Never(intentional — same person uses different keyboards on different devices). Preserved.block_list_element.rs:3072andalt_screen_element.rs:620already compares incomingkey_code: KeyCodeto the configured one, so new variants don't require dispatch-layer changes.What's in the spec
product.md— 8 testable behavior invariants (B1–B8), 6 acceptance criteria (A1–A6), explicit non-goals (chord hotkeys, per-tab/profile, cloud sync), and 5 risks with concrete TECH decisions to make.tech.md— picks the data-model approach with explicit reasoning, names every file and line range touched, and lays out 6 unit tests + 4 modal tests + 3 integration tests.Architectural choices
Custom(KeyCode)variant on the existing enum, not a separate override field. Single source of truth for "what key fires voice input"; avoids a "two settings disagree" failure mode.Test plan
voice_input_toggle_key = "alt_left"in TOML sees no behavior change after upgradevoice_input_toggle_key = "enter"shows toast and resets to NoneOpen questions for maintainer review
SettingsValuederive macro support enum variants with payloads (Custom(KeyCode)), or do we need a manualSerialize/Deserializeimpl?DropdownItem::heading()constructor for sectioning, or should this PR add one?KeyCodespelling (debug aid) or only the friendly display name?Closes (spec-only) #9869 — implementation PR will follow once spec direction is confirmed.